Abstract: Internet worm attacks pose a significant threat to network security and management. In this work, we coin the term Internet
worm tomography for inferring the characteristics of Internet worms from the observations of Darknet or network telescopes that
monitor a routable but unused IP address space. Under the framework of Internet worm tomography, we attempt to infer Internet worm
temporal behaviors, i.e., the host infection time and the worm infection sequence, and thus pinpoint patient zero or initially infected hosts.
Specifically, we introduce statistical estimation techniques and propose method of moments, maximum likelihood, and linear regression
estimators. We show analytically and empirically that our proposed estimators infer worm temporal characteristics better than the
naive estimator used in previous work. We also demonstrate that our estimators can be applied to worms using
different scanning strategies such as random scanning and localized scanning.

Index Terms: Internet worm tomography, Darknet, statistical estimation, host infection time, worm infection sequence.
1 Introduction

Since the Code Red and Nimda worms were released in
2001, epidemic-style attacks have caused severe damage.
Internet worms can spread so rapidly that existing
defense systems cannot respond until most vulnerable hosts
have been infected. For example, on January 25th, 2003,
the Slammer worm reached its maximum scanning rate of
more than 55 million scans per second in about 3 minutes,
and infected more than 90% of vulnerable machines within
10 minutes [1]. It cost over one billion US dollars in
cleanup and economic damages. Therefore, worm attacks
pose significant threats to the Internet and, at the same time,
present tremendous challenges to the research community.

To counteract these notorious plague-tide attacks, various
detection and defense strategies have been studied in recent
years. According to where the detectors are located, these
strategies can generally be classified into three categories:
source detection and defenses, detecting infected hosts in the
local networks [2], [3], [4], [5]; middle detection and defenses,
revealing the appearance of worms by analyzing the traffic
going through routers [6], [7], [8]; and destination detection
and defenses, monitoring unwanted traffic arriving at Darknet
or network telescopes, a globally routable address space
where no active services or servers reside [9], [10], [11],
[12], [13]. There are two types of Darknet: active Darknet
that responds to malicious scans to elicit the payloads of
the attacks [11], [12], and passive Darknet that observes
unwanted traffic passively [10], [13].

Different from source and middle detection and defenses,
destination detection and defenses offer unique advantages
in observing large-scale network explosive events such
as distributed denial-of-service (DDoS) attacks [14] and
Internet worms [15], [1], [16]. There is no legitimate reason
for packets destined to Darknet. Hence, most of the traffic
arriving at Darknet is malicious or unintended, including
hostile reconnaissance scans, probe activities from active
worms, DDoS backscatter, and packets from mis-configured
hosts. Moreover, it has been shown that for a large-scale
worm event, most infected hosts, if not all, can be
observed by a Darknet of sufficiently large size [17].

In this work, we focus on destination detection and
defenses. Specifically, we study the problem of inferring
the characteristics of Internet worms from Darknet observations.
We refer to such a problem as Internet worm tomography,
as illustrated in Fig. 1. Most worms use scan-based
methods to find vulnerable hosts and randomly generate
target IP addresses. Thus, Darknet can observe partial scans
from infected hosts. Together with the worm propagation
model and the statistical model, Darknet observations can
be used to detect worm appearance [18], [19], [20], [21] and
infer worm characteristics (e.g., infection rate [22], number
of infected hosts [17], [23], and worm infection sequence
[24], [25], [26]). Internet worm tomography is named after
network tomography, which infers the characteristics of the
internal network (e.g., link loss rate, link delay, and topology)
through the observations from end systems [27], [28].
Network tomography can be formulated as a linear inverse
problem. Internet worm tomography, however, cannot be
translated into the linear inverse problem due to the specific
properties of worm propagation, and thus presents new
challenges.

Under the framework of Internet worm tomography, 
researchers have studied worm temporal characteristics 
and have attempted to answer the following important 
questions: 

• Host infection time: When exactly does a specific host
get infected? This information is critical for the reconstruction
of the worm infection sequence.

• Worm infection sequence: What is the order in which
hosts are infected by worm propagation? Such an
order can help identify patient zero or initially infected
hosts [24].

Fig. 1. Internet worm tomography.

Both the infection time and the infection sequence
are important for defending against worms.
First, the identification of patient zero or initially infected
hosts and their infection times provides forensic clues for
law enforcement against the attackers who wrote and
spread the worm. Second, the knowledge of the infection
sequence provides insights into how a worm spread across
the Internet (e.g., characteristics on who infected whom)
and how network defense systems were breached.

A simple estimator has been proposed in [25] to infer
worm temporal behaviors. The estimator uses the observation
time when an infected host scans the Darknet for the
first time as the approximation of the host infection time to
infer the worm infection sequence. Such a naive estimator,
however, does not fully exploit all information obtained
by the Darknet. Moreover, an attacker can design a smart
worm that uses lower scanning rates for patient zero or
initially infected hosts and higher scanning rates for other
infected hosts. In this way, the smart worm would weaken
the performance of the naive estimator.

The goal of this paper is to infer the Internet worm 
temporal characteristics accurately by exploiting Darknet 
observations and applying statistical estimation techniques. 
Our research work makes several contributions: 

• We propose method of moments, maximum likelihood, 
and linear regression statistical estimators to infer the 
host infection time. We show analytically and empir- 
ically that the mean squared error of our proposed 
estimators can be almost half of that of the naive 
estimator in inferring the host infection time. 

• We extend our proposed estimators to infer the worm
infection sequence. Specifically, we formulate the problem
of estimating the worm infection sequence as a
detection problem and derive the probability of error
detection for different estimators. We demonstrate
analytically and empirically that our method performs
much better than the algorithm proposed in [25].

• We show empirically that our estimators have a better
performance in identifying patient zero or initially
infected hosts of the smart worm than the naive estimator.
We also demonstrate that our estimators can be
applied to worms using different scanning strategies
such as random scanning and localized scanning.

Fig. 2. An illustration of Darknet observations.

The remainder of this paper is organized as follows.
Section 2 introduces estimators for inferring the host infection
time. Section 3 presents our algorithms for estimating the
worm infection sequence. Section 4 gives simulation results.
Section 5 discusses the assumptions, the limitations, and
the extensions of our estimators. Finally, Section 6 reviews
related work, and the final section concludes the paper.

2 Estimating the Host Infection Time 

We use Darknet observations to estimate when a host gets
infected and use hit to denote the event that a worm scan
hits the Darknet. As shown in Fig. 2, suppose that a certain
host is infected at time $t_0$. The Darknet monitors a portion
of the IPv4 address space and can observe some scans
from this host and record hit times $t_1, t_2, \ldots, t_n$, where $n$
is the number of hit events from this host. The problem
of estimating the host infection time can then be stated as
follows: Given the Darknet observations $t_1, t_2, \ldots, t_n$, what
is the best estimate of $t_0$?

To study this problem, we make the following assumptions:
1) There is no packet loss in the Internet. 2) An
infected host uses its actual source IP address and does
not apply IP spoofing, which is the case for TCP worms.
3) The scanning rate $s$ (i.e., the number of scans sent by an
infected host per time unit) is time-invariant for an infected
host, whereas the scanning rates of infected hosts can be
different from each other. The last assumption comes from
the observation that famous worms, such as Code Red,
Nimda, Slammer, and Witty, do not apply any scanning
rate variation mechanisms. An infected host always scans
for vulnerable hosts at the maximum speed allowed by
its computing resources and network conditions [29]. In
Section 5, we will revisit and discuss these assumptions.

Inferring $t_0$ from Darknet observations is
affected by the scanning method of the worm. In this paper,
we focus on random scanning and localized scanning.
However, as long as a scan from an infected host hits Darknet with
a time-invariant probability, our estimation techniques are
independent of worm-scanning methods. To analytically
estimate the host infection time, we consider a discrete-time
system. For random scanning (RS), a worm selects
targets randomly and scans the entire IPv4 address space
with $\Omega$ addresses (i.e., $\Omega = 2^{32}$). We assume that Darknet
monitors $\omega$ addresses. Thus, the probability for a scan to
hit the Darknet is $\omega/\Omega$; and the probability of a hit event in
the discrete-time system (i.e., the probability that Darknet
observes at least one scan from the same infected host in a
time unit) is

$$\mathrm{Pr}_{RS}(\text{hit event}) = 1 - \left(1 - \frac{\omega}{\Omega}\right)^{s}. \tag{1}$$

Since $s$ is time-invariant for a given infected host,
$\mathrm{Pr}_{RS}(\text{hit event})$ is also time-invariant.

Localized scanning (LS) preferentially searches for vulnerable
hosts in the "local" address space [30]. For simplicity,
in this paper we only consider the /l LS: $p_a$ ($0 < p_a < 1$)
of the time, a "local" address with the same first $l$ bits
as the attacking host is chosen as the target; $1 - p_a$ of
the time, a random address is chosen. We consider a
centralized Darknet that occupies a continuous address
space and monitors $\omega$ addresses. Moreover, we assume that
the Darknet is contained in a /l prefix with no vulnerable
hosts. For example, network telescopes used by CAIDA are
such a centralized Darknet and contain a /8 subnet. Since
no infected hosts exist in the /l subnet where the Darknet
resides, the probability for a worm scan to hit the Darknet
is $(1 - p_a) \cdot \omega/\Omega$. Therefore, the probability of a hit event
in the discrete-time system is

$$\mathrm{Pr}_{LS}(\text{hit event}) = 1 - \left(1 - (1 - p_a) \cdot \frac{\omega}{\Omega}\right)^{s}, \tag{2}$$

which is time-invariant. Since $\mathrm{Pr}_{RS}(\text{hit event})$ has a
form similar to $\mathrm{Pr}_{LS}(\text{hit event})$ and is the special case of
$\mathrm{Pr}_{LS}(\text{hit event})$ when $p_a = 0$, we use $p$ ($0 < p < 1$) to denote
the hit probability in general for both cases to simplify our
discussion.
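As a rough illustration of Equations (1) and (2), the following Python sketch computes the per-time-unit hit probability $p$. The function name `hit_probability` and its parameter names are assumptions made for this example; the sample values (358 scans/min, a 20-second time unit, a Darknet of $2^{20}$ addresses, $p_a = 0.7$) are borrowed from the simulation settings in Section 4, and allowing a non-integer number of scans per time unit is a simplification.

```python
def hit_probability(scans_per_unit, darknet_size, local_prob=0.0,
                    address_space=2 ** 32):
    """Probability that at least one scan sent in one time unit hits the Darknet.

    local_prob = 0 corresponds to random scanning (Equation (1));
    0 < local_prob < 1 corresponds to /l localized scanning (Equation (2)).
    """
    per_scan = (1.0 - local_prob) * darknet_size / address_space
    return 1.0 - (1.0 - per_scan) ** scans_per_unit


# 358 scans/min with a 20-second time unit is roughly 119 scans per time unit.
scans_per_unit = 358 / 3
p_rs = hit_probability(scans_per_unit, 2 ** 20)                  # random scanning
p_ls = hit_probability(scans_per_unit, 2 ** 20, local_prob=0.7)  # localized scanning
print(p_rs, p_ls)
```

Under these settings the localized-scanning host hits the Darknet less often than the random-scanning host, which is why a single symbol $p$ can stand for either case in the rest of the analysis.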

Denote $\delta_0$ as the time interval between when a host gets
infected and when Darknet observes the first scan from
this host, i.e., $\delta_0 = t_1 - t_0$, as shown in Fig. 2. Denote $\delta_i$
as the time interval between the $i$-th hit and the $(i+1)$-th hit on
Darknet, i.e., $\delta_i = t_{i+1} - t_i$, $i \ge 1$. Thus, $\delta_0, \delta_1, \ldots, \delta_{n-1}$ are
independent and identically distributed (i.i.d.) and follow
a geometric distribution with parameter $p$, i.e.,

$$\Pr(\delta = k) = p \cdot (1 - p)^{k-1}, \quad k = 1, 2, 3, \ldots, \tag{3}$$

with mean and variance

$$E(\delta) = \frac{1}{p} = \mu, \qquad \mathrm{Var}(\delta) = \frac{1 - p}{p^2}. \tag{4}$$

Denote $\mu$ as the mean value of $\delta$ and $\hat{\mu}$ as the estimate of
$\mu$. We then estimate $t_0$ by subtracting $\hat{\mu}$ from $t_1$, i.e.,

$$\hat{t}_0 = t_1 - \hat{\mu}. \tag{5}$$

Therefore, our problem is reduced to estimating $\mu$. Table 1
summarizes the notations used in this paper.
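The observation model can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (no packet loss, time-invariant $p$); the helper name `simulate_hit_times` and the values $t_0 = 0$, $p = 0.03$, and $n = 2000$ are hypothetical choices for the example.

```python
import random


def simulate_hit_times(t0, p, n, rng):
    """Return hit times t_1 < ... < t_n for a host infected at tick t0.

    In every time unit the Darknet sees a hit with probability p, so each
    gap between consecutive hits is geometric on {1, 2, ...} (Equation (3)).
    """
    times, t = [], t0
    for _ in range(n):
        gap = 1
        while rng.random() >= p:  # no hit in this time unit (probability 1 - p)
            gap += 1
        t += gap
        times.append(t)
    return times


rng = random.Random(1)
hits = simulate_hit_times(t0=0, p=0.03, n=2000, rng=rng)
gaps = [later - earlier for earlier, later in zip([0] + hits[:-1], hits)]
mean_gap = sum(gaps) / len(gaps)
print(mean_gap)  # should be close to mu = 1/p
```

The empirical mean gap approaches $\mu = 1/p$ as the number of hits grows, which is the quantity every estimator in this section tries to recover.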
TABLE 1
Notations used in this paper.

Notation       Definition
$\Omega$       Size of the scanning space
$\omega$       Size of the Darknet
$s$            Scanning rate (scans/time unit)
$\sigma$       Standard deviation of the scanning rate
$p_a$          Probability that an address with the same first $l$ bits as the attacking host is chosen by LS
$p$            Probability that at least one scan from the same infected host hits the Darknet in a time unit
$t_0$          Host infection time
$\hat{t}_0$    Estimated host infection time
$t_i$          Discrete time tick when the infected host hits the Darknet for the $i$-th time ($i \ge 1$)
$\delta_i$     Time interval between two consecutive hits of the Darknet ($\delta_i = t_{i+1} - t_i$, $i \ge 1$)
$n$            Number of hit events observed at the Darknet for an infected host
$\mu$          Mean of $\delta$
$\hat{\mu}$    Estimate of $\mu$
$D$            Sequence distance
$S_i$          Worm infection sequence
$\hat{S}_i$    Estimated worm infection sequence
$N$            Length of the worm infection sequence considered for evaluation



2.1 Naive Estimator 

Since $\delta$ follows the geometric distribution described by
Equation (3), $\Pr(\delta = k)$ is maximized when $k = 1$. Then, a naive
estimator (NE) of $\mu$ is

$$\hat{\mu}_{NE} = 1. \tag{6}$$

Thus, the NE of $t_0$ is

$$\hat{t}_{0NE} = t_1 - \hat{\mu}_{NE} = t_1 - 1. \tag{7}$$

Note that $\hat{t}_{0NE}$ depends only on $t_1$, but not on $t_2, t_3, \ldots, t_n$.
This estimator has been used in [25] to infer the host
infection time and the worm infection sequence. In this
paper, however, we consider more advanced estimation
methods.

2.2 Method of Moments Estimator 

Since $E(\delta) = \mu$, we design a method of moments estimator
(MME), i.e.,

$$\hat{\mu}_{MME} = \frac{1}{n-1} \sum_{i=1}^{n-1} \delta_i = \frac{t_n - t_1}{n-1}. \tag{8}$$

Thus, the MME of $t_0$ is

$$\hat{t}_{0MME} = t_1 - \hat{\mu}_{MME} = t_1 - \frac{t_n - t_1}{n-1}. \tag{9}$$

Note that $\hat{t}_{0MME}$ is not only related to $t_1$, but also to $n$ and
$t_n$.
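Equations (7) and (9) translate directly into code. The sketch below is a minimal illustration; the function names and the sample hit times are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def naive_estimate(hit_times):
    """Naive estimator of t0 (Equation (7)): first hit time minus one."""
    return hit_times[0] - 1


def mme_estimate(hit_times):
    """Method of moments estimator of t0 (Equation (9))."""
    n = len(hit_times)
    if n < 2:
        raise ValueError("the MME needs at least two hits")
    mu_hat = (hit_times[-1] - hit_times[0]) / (n - 1)  # Equation (8)
    return hit_times[0] - mu_hat


# Hypothetical hit times t_1..t_5 (in time ticks) for one infected host.
hits = [40, 70, 95, 130, 160]
t0_ne = naive_estimate(hits)
t0_mme = mme_estimate(hits)
print(t0_ne, t0_mme)
```

Here the average gap is $(160 - 40)/4 = 30$ ticks, so the MME places the infection at tick 10, whereas the naive estimator places it at tick 39, just before the first hit.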



2.3 Maximum Likelihood Estimator 

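Rewrite the probability mass function of $\delta$ in Equation (3)
with respect to $\mu$,

$$\Pr(\delta; \mu) = \frac{1}{\mu}\left(1 - \frac{1}{\mu}\right)^{\delta - 1}, \quad \delta = 1, 2, 3, \ldots. \tag{10}$$

Since $\delta_1, \delta_2, \ldots, \delta_{n-1}$ are i.i.d., the likelihood function is
given by the following product

$$L(\mu) = \prod_{i=1}^{n-1} \Pr(\delta_i; \mu) = \prod_{i=1}^{n-1} \frac{1}{\mu}\left(1 - \frac{1}{\mu}\right)^{\delta_i - 1}. \tag{11}$$

We then design a maximum likelihood estimator (MLE), i.e.,

$$\hat{\mu}_{MLE} = \arg\max_{\mu} L(\mu). \tag{12}$$

Rather than maximizing $L(\mu)$, we choose to maximize its
logarithm $\ln L(\mu)$. That is, we solve

$$\frac{d}{d\mu} \ln L(\mu) = 0, \tag{13}$$

which yields

$$\hat{\mu}_{MLE} = \frac{1}{n-1} \sum_{i=1}^{n-1} \delta_i = \frac{t_n - t_1}{n-1}, \tag{14}$$

which has the same expression as the MME. Thus,

$$\hat{t}_{0MLE} = t_1 - \hat{\mu}_{MLE} = t_1 - \frac{t_n - t_1}{n-1}. \tag{15}$$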
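The equality of the MLE and the MME can be checked numerically. The sketch below evaluates the log-likelihood implied by Equations (10) and (11) on a grid of candidate values of $\mu$ and compares the maximizer with the sample mean of the gaps. The gap values and the grid resolution are hypothetical choices made for this illustration.

```python
import math


def log_likelihood(mu, gaps):
    """ln L(mu) for i.i.d. geometric gaps, following Equations (10)-(11)."""
    m = len(gaps)
    return -m * math.log(mu) + (sum(gaps) - m) * math.log(1.0 - 1.0 / mu)


# Hypothetical inter-hit gaps delta_1..delta_{n-1} with n = 5 hits.
gaps = [30, 25, 35, 30]
closed_form = sum(gaps) / len(gaps)                 # Equation (14)

# Grid search over mu in (1, 101) with step 0.01.
grid = [1.0 + 0.01 * k for k in range(1, 10000)]
numeric = max(grid, key=lambda mu: log_likelihood(mu, gaps))
print(closed_form, numeric)
```

Because the log-likelihood has a single maximum, the grid maximizer lands within one grid step of the closed-form value.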



2.4 Linear Regression Estimator 

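Under the assumption that the scanning rate of an individual
infected host is time-invariant, the relationship between
$t_i$ and $i$ can be described by a linear regression model as
illustrated in Fig. 3, i.e.,

$$t_i = \alpha + \beta \cdot i + \varepsilon_i, \tag{16}$$

where $\alpha$ and $\beta$ are coefficients, and $\varepsilon_i$ is the error term. To
fit the observation data, we apply the least squares method
to adjust the parameters of the model. That is, we choose
the coefficients that minimize the residual sum of squares
(RSS)

$$\mathrm{RSS} = \sum_{i=1}^{n} \left[t_i - (\alpha + \beta \cdot i)\right]^2. \tag{17}$$

The minimum RSS occurs when the partial derivatives with
respect to the coefficients are zero

$$\begin{cases} \dfrac{\partial \mathrm{RSS}}{\partial \alpha} = -2 \sum_{i=1}^{n} (t_i - \alpha - \beta \cdot i) = 0, \\[2ex] \dfrac{\partial \mathrm{RSS}}{\partial \beta} = -2 \sum_{i=1}^{n} i \cdot (t_i - \alpha - \beta \cdot i) = 0, \end{cases} \tag{18}$$

which leads to

$$\hat{\alpha} = \bar{t} - \hat{\beta} \cdot \bar{i}, \qquad \hat{\beta} = \frac{\overline{i \cdot t} - \bar{i} \cdot \bar{t}}{\overline{i^2} - (\bar{i})^2}, \tag{19}$$

where the bar symbols denote the average values

$$\bar{i} = \frac{1}{n} \sum_{i=1}^{n} i, \quad \overline{i^2} = \frac{1}{n} \sum_{i=1}^{n} i^2, \quad \bar{t} = \frac{1}{n} \sum_{i=1}^{n} t_i, \quad \overline{i \cdot t} = \frac{1}{n} \sum_{i=1}^{n} i \cdot t_i. \tag{20}$$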
Fig. 3. Linear regression model.

TABLE 2
Comparison of estimator properties ($\hat{\mu}$).

Estimator                                                                                                  Bias($\hat{\mu}$)      Var($\hat{\mu}$)                            MSE($\hat{\mu}$)
$\hat{\mu}_{NE} = 1$                                                                                       $1 - \frac{1}{p}$      $0$                                         $\frac{(1-p)^2}{p^2}$
$\hat{\mu}_{MME} = \frac{t_n - t_1}{n-1}$                                                                  $0$                    $\frac{1-p}{p^2 (n-1)}$                     $\frac{1-p}{p^2 (n-1)}$
$\hat{\mu}_{LRE} = \frac{\overline{i \cdot t} - \bar{i} \cdot \bar{t}}{\overline{i^2} - (\bar{i})^2}$      $0$                    $\frac{6(n^2+1)(1-p)}{5n(n^2-1)p^2}$        $\frac{6(n^2+1)(1-p)}{5n(n^2-1)p^2}$



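We then design a linear regression estimator (LRE), i.e.,

$$\hat{\mu}_{LRE} = \hat{\beta} = \frac{\overline{i \cdot t} - \bar{i} \cdot \bar{t}}{\overline{i^2} - (\bar{i})^2}. \tag{21}$$

Thus, the LRE of $t_0$ is

$$\hat{t}_{0LRE} = t_1 - \hat{\mu}_{LRE} = t_1 - \frac{\overline{i \cdot t} - \bar{i} \cdot \bar{t}}{\overline{i^2} - (\bar{i})^2}. \tag{22}$$

There is another way to estimate $t_0$, which uses the point
of interception shown in Fig. 3 as the estimate of $t_0$, i.e.,

$$\hat{t}'_{0LRE} = \hat{\alpha} = \bar{t} - \hat{\mu}_{LRE} \cdot \bar{i}. \tag{23}$$

However, we find that the mean squared error of $\hat{t}'_{0LRE}$
increases when $n$ increases. That is, the performance of
this estimator worsens as the number of hits grows,
which makes it undesirable.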
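The least squares slope and the resulting estimate of $t_0$ can be computed with a few averages. The following sketch is an illustration only; the function name `lre_estimate` and the sample hit times are hypothetical.

```python
def lre_estimate(hit_times):
    """Linear regression estimator: return (t0_hat, mu_hat).

    mu_hat is the least squares slope of t_i against i (Equation (21)),
    and t0_hat = t_1 - mu_hat (Equation (22)).
    """
    n = len(hit_times)
    if n < 2:
        raise ValueError("the LRE needs at least two hits")
    idx = range(1, n + 1)
    i_bar = sum(idx) / n
    t_bar = sum(hit_times) / n
    it_bar = sum(i * t for i, t in zip(idx, hit_times)) / n
    i2_bar = sum(i * i for i in idx) / n
    mu_hat = (it_bar - i_bar * t_bar) / (i2_bar - i_bar ** 2)
    return hit_times[0] - mu_hat, mu_hat


# Hypothetical, perfectly regular hit times: one hit every 30 ticks.
hits = [40, 70, 100, 130, 160]
t0_lre, mu_lre = lre_estimate(hits)
print(t0_lre, mu_lre)
```

With perfectly regular hits the fitted slope equals the gap of 30 ticks, so the estimate of $t_0$ is tick 10. With noisy gaps the slope averages over all $n$ observations rather than relying on $t_1$ alone.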

2.5 Comparison of Estimators 

To compare the performance of the naive estimator and our
proposed estimators, we compute the bias, the variance,
and the mean squared error (MSE). For estimating $\mu$,

$$\begin{aligned} \mathrm{Bias}(\hat{\mu}) &= E(\hat{\mu}) - \mu, \\ \mathrm{Var}(\hat{\mu}) &= E\left[(\hat{\mu} - E(\hat{\mu}))^2\right], \\ \mathrm{MSE}(\hat{\mu}) &= E\left[(\hat{\mu} - \mu)^2\right] = \mathrm{Bias}^2(\hat{\mu}) + \mathrm{Var}(\hat{\mu}). \end{aligned} \tag{24}$$

Here, the bias denotes the average deviation of the estimator
from the true value; the variance indicates the spread
of the estimator around its mean; and the MSE characterizes
the closeness of the estimated value to the true
value. A smaller MSE indicates a better estimator. Table 2
summarizes the results of NE, MME (or MLE), and LRE
for estimating $\mu$. The details of the derivations of Table 2
are given in Appendix A. It is noted that MME and
LRE are unbiased, while NE is biased. Moreover, MME
and LRE have a smaller MSE than NE if $n > 2$ and
$p < 0.5$, a condition that is usually satisfied. Specifically,
when $n \to \infty$, $\mathrm{MSE}(\hat{\mu}_{MME}) \to 0$ and $\mathrm{MSE}(\hat{\mu}_{LRE}) \to 0$, but
$\mathrm{MSE}(\hat{\mu}_{NE})$ remains $(1-p)^2/p^2$. It is also observed that MME is
slightly better than LRE in terms of MSE when $n > 2$.

Similarly, we compute the bias, the variance, and the
MSE of the estimators for estimating $t_0$ in Table 3. The
details of the derivations of Table 3 are given in Appendix
B. We also observe that MME (or MLE) and LRE are
unbiased, whereas NE is biased. Moreover, $\mathrm{MSE}(\hat{t}_{0MME})$ and
$\mathrm{MSE}(\hat{t}_{0LRE})$ are smaller than $\mathrm{MSE}(\hat{t}_{0NE})$, and $\mathrm{MSE}(\hat{t}_{0MME})$ is
the smallest when $n > 3$ and $p < 0.5$. Specifically, in
practice, Darknet only covers a relatively small portion of
the IPv4 address space (i.e., $\omega \ll \Omega$), which leads to $p \ll 1$.
Thus, we have the following theorem:

TABLE 3
Comparison of estimator properties ($\hat{t}_0$).

Estimator                                      Bias($\hat{t}_0$)     Var($\hat{t}_0$)                                            MSE($\hat{t}_0$)
$\hat{t}_{0NE} = t_1 - \hat{\mu}_{NE}$         $\frac{1-p}{p}$       $\frac{1-p}{p^2}$                                           $\frac{(1-p)(2-p)}{p^2} \approx \frac{2(1-p)}{p^2}$ (when $p \ll 1$)
$\hat{t}_{0MME} = t_1 - \hat{\mu}_{MME}$       $0$                   $\frac{1-p}{p^2} \cdot \frac{n}{n-1}$                       $\frac{1-p}{p^2} \cdot \frac{n}{n-1} \approx \frac{1-p}{p^2}$ (when $n \gg 1$)
$\hat{t}_{0LRE} = t_1 - \hat{\mu}_{LRE}$       $0$                   $\frac{1-p}{p^2} \cdot \frac{5n^3+6n^2-5n+6}{5n(n^2-1)}$    $\frac{1-p}{p^2} \cdot \frac{5n^3+6n^2-5n+6}{5n(n^2-1)} \approx \frac{1-p}{p^2}$ (when $n \gg 1$)

Theorem 1: When the Darknet observes a sufficient number
of hits (i.e., $n \gg 1$) and $p \ll 1$,

$$\mathrm{MSE}(\hat{t}_{0MME}) = \mathrm{MSE}(\hat{t}_{0MLE}) \approx \mathrm{MSE}(\hat{t}_{0LRE}) \approx \frac{1}{2}\, \mathrm{MSE}(\hat{t}_{0NE}). \tag{25}$$

That is, the MSE of our proposed estimators is almost half
of that of the naive estimator, so our proposed estimators
are nearly twice as accurate as the naive estimator
in estimating the host infection time.

3 Estimating the Worm Infection Sequence

In this section, we extend our proposed estimators for
inferring the worm infection sequence.

3.1 Algorithm

Our algorithm first estimates the infection time
of each infected host. Then, we reconstruct the infection
sequence based on these infection times. That is, if $\hat{t}_{0A} < \hat{t}_{0B}$,
we infer that host A is infected before host B. It is noted
that the algorithm used in [25] to infer the worm infection
sequence can be regarded as using this approach with the
naive estimator.

The naive estimator, however, can potentially fail to
infer the worm infection sequence in some cases. Fig. 4
shows an example, where hosts A and B get infected at
$t_{0A}$ and $t_{0B}$, respectively, and $t_{0A} < t_{0B}$. Moreover, these
two infected hosts have scanning rates $s_A < s_B$ such that
Darknet observes $t_{1A} > t_{1B}$. If the naive estimator is used,
$\hat{t}_{0A} > \hat{t}_{0B}$, which means that host A is incorrectly inferred
to be infected after host B. Intuitively, if our proposed
estimators are applied, it is possible to obtain $\hat{t}_{0A} < \hat{t}_{0B}$ and
thus recover the real infection sequence.

Fig. 4. A scenario of the worm infection sequence.

3.2 Performance Analysis

To analytically show that our estimators are more accurate
than the naive estimator in estimating the worm infection
sequence, we formulate the problem as a detection problem.
Specifically, in Fig. 4, suppose that host B is infected after
host A (i.e., $t_{0A} < t_{0B}$). If $\hat{t}_{0A} < \hat{t}_{0B}$, we call it "success" detection;
otherwise, if $\hat{t}_{0A} > \hat{t}_{0B}$, we call it "error" detection
(we ignore the case $\hat{t}_{0A} = \hat{t}_{0B}$ here).
We intend to calculate the probability of error detection for
different estimators.

Note that $\delta_{0A} = t_{1A} - t_{0A}$ and $\delta_{0B} = t_{1B} - t_{0B}$ follow the
geometric distribution (i.e., Equation (3)) with parameters
$p_A$ and $p_B$, respectively. Here, $p_A$ (or $p_B$) is the probability
that at least one scan from host A (or B) hits the Darknet in
a time unit and follows Equation (1) for random scanning
and Equation (2) for localized scanning. Moreover, $p_A$ (or
$p_B$) depends on $s_A$ (or $s_B$) so that if $s_A < s_B$, then $p_A < p_B$.
Since $\omega \ll \Omega$, we have $p_A \ll 1$ and $p_B \ll 1$. Hence, for
simplicity we use the continuous-time analysis and apply
the exponential distribution to approximate the geometric
distribution for $\delta_{0A}$ and $\delta_{0B}$ [31], i.e.,

$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0, \end{cases} \tag{26}$$

where $\lambda = p_A$ or $p_B$.

To calculate the probability of error detection for different
estimators, we first define a new random variable

$$Z = \delta_{0A} - \delta_{0B}, \tag{27}$$

and calculate its probability density function (pdf) $f_Z(z)$.
From Equation (26), we can obtain the pdf of $\delta'_{0B} = -\delta_{0B}$,
which is

$$f_{\delta'_{0B}}(x) = \begin{cases} p_B e^{p_B x}, & x \le 0 \\ 0, & x > 0. \end{cases} \tag{28}$$

Since $\delta_{0A}$ and $\delta'_{0B}$ are independent, the pdf of $Z = \delta_{0A} + \delta'_{0B}$
is given by the convolution of $f_{\delta_{0A}}(x)$ and $f_{\delta'_{0B}}(x)$, i.e.,

$$f_Z(z) = \int_{-\infty}^{+\infty} f_{\delta_{0A}}(x)\, f_{\delta'_{0B}}(z - x)\, dx. \tag{29}$$

For $z > 0$, this yields

$$f_Z(z) = \int_{z}^{+\infty} p_A e^{-p_A x} \cdot p_B e^{p_B (z - x)}\, dx = \frac{p_A p_B}{p_A + p_B}\, e^{-p_A z}. \tag{30}$$

For $z \le 0$, we obtain

$$f_Z(z) = \int_{0}^{+\infty} p_A e^{-p_A x} \cdot p_B e^{p_B (z - x)}\, dx = \frac{p_A p_B}{p_A + p_B}\, e^{p_B z}. \tag{31}$$

Hence,

$$f_Z(z) = \begin{cases} \dfrac{p_A p_B}{p_A + p_B}\, e^{-p_A z}, & z > 0 \\[2ex] \dfrac{p_A p_B}{p_A + p_B}\, e^{p_B z}, & z \le 0. \end{cases} \tag{32}$$

3.2.1 Naive Estimator

The naive estimator uses $\hat{t}_0 = t_1 - 1$ to estimate $t_0$. Thus,
the probability of error detection is

$$\mathrm{Pr}_{NE}(\text{error}) = \Pr(t_{1A} - 1 > t_{1B} - 1) = \Pr(\delta_{0A} > \tau + \delta_{0B}), \tag{33}$$

where $\tau = t_{0B} - t_{0A}$ is the time interval between the infection
of host A and host B, and $\tau > 0$. We then have

$$\mathrm{Pr}_{NE}(\text{error}) = \Pr(\delta_{0A} - \delta_{0B} > \tau) = \Pr(Z > \tau) = \int_{\tau}^{+\infty} \frac{p_A p_B}{p_A + p_B}\, e^{-p_A z}\, dz = \frac{p_B}{p_A + p_B}\, e^{-p_A \tau}. \tag{34}$$

Note that another way to derive $\mathrm{Pr}_{NE}(\text{error})$ is based on the
memoryless property of the exponential distribution and
$\Pr(\delta_{0A} > \delta_{0B}) = p_B/(p_A + p_B)$, i.e.,

$$\mathrm{Pr}_{NE}(\text{error}) = \Pr(\delta_{0A} > \tau + \delta_{0B}) = \Pr(\delta_{0A} > \tau) \Pr(\delta_{0A} > \delta_{0B}), \tag{35}$$

which leads to the same result.

3.2.2 Proposed Estimators

We assume that Darknet observes a sufficient number of
scans from hosts A and B so that our proposed estimators
can estimate $\mu_A$ (i.e., $1/p_A$) and $\mu_B$ (i.e., $1/p_B$) accurately. Then,
the probability of error detection of our proposed estimators
is

$$\begin{aligned} \mathrm{Pr}_{MME}(\text{error}) &= \mathrm{Pr}_{MLE}(\text{error}) = \mathrm{Pr}_{LRE}(\text{error}) \\ &= \Pr\left(t_{1A} - \frac{1}{p_A} > t_{1B} - \frac{1}{p_B}\right) \\ &= \Pr\left(\delta_{0A} - \delta_{0B} > \tau + \frac{1}{p_A} - \frac{1}{p_B}\right) \\ &= \Pr\left(Z > \tau + \frac{p_B - p_A}{p_A p_B}\right) \\ &= \int_{\tau + \frac{p_B - p_A}{p_A p_B}}^{+\infty} f_Z(z)\, dz. \end{aligned} \tag{36}$$

When $\tau + \frac{p_B - p_A}{p_A p_B} \ge 0$,

$$\mathrm{Pr}_{MME}(\text{error}) = \int_{\tau + \frac{p_B - p_A}{p_A p_B}}^{+\infty} \frac{p_A p_B}{p_A + p_B}\, e^{-p_A z}\, dz = \frac{p_B}{p_A + p_B}\, e^{-p_A \left(\tau + \frac{p_B - p_A}{p_A p_B}\right)}. \tag{37}$$

When $\tau + \frac{p_B - p_A}{p_A p_B} < 0$,

$$\mathrm{Pr}_{MME}(\text{error}) = \int_{\tau + \frac{p_B - p_A}{p_A p_B}}^{0} \frac{p_A p_B}{p_A + p_B}\, e^{p_B z}\, dz + \int_{0}^{+\infty} \frac{p_A p_B}{p_A + p_B}\, e^{-p_A z}\, dz = 1 - \frac{p_A}{p_A + p_B}\, e^{p_B \left(\tau + \frac{p_B - p_A}{p_A p_B}\right)}. \tag{38}$$

Fig. 5. Analytical results of Pr(error) when changing $p_A$ and $p_B$ ($\tau = 50$ time units). (a) $\mathrm{Pr}_{NE}(\text{error})$. (b) $\mathrm{Pr}_{MME}(\text{error})$.



3.2.3 Performance Comparison 

Since $\mathrm{Pr}_{NE}(\text{error}) = \Pr(Z > \tau)$ and $\mathrm{Pr}_{MME}(\text{error}) = \Pr\left(Z > \tau + \frac{p_B - p_A}{p_A p_B}\right)$,
for a given $\tau$ ($\tau > 0$), comparing Equation (34)
with Equations (37) and (38) gives

$$\begin{cases} \mathrm{Pr}_{NE}(\text{error}) > \mathrm{Pr}_{MME}(\text{error}), & p_A < p_B \\ \mathrm{Pr}_{NE}(\text{error}) < \mathrm{Pr}_{MME}(\text{error}), & p_A > p_B. \end{cases} \tag{39}$$

Hence, it is unclear which estimator is better based on
the expressions of $\mathrm{Pr}_{NE}(\text{error})$ and $\mathrm{Pr}_{MME}(\text{error})$ alone. However,
we can compare the performance of our estimators
with the naive estimator through numerical analysis. We
first show the probabilities of error detection (i.e.,
$\mathrm{Pr}_{NE}(\text{error})$ and $\mathrm{Pr}_{MME}(\text{error})$) as functions of $p_A$ and $p_B$
in Fig. 5, where $\tau = 50$ time units. It can be seen that for
the naive estimator, when host A hits the Darknet with a
very low probability, $\mathrm{Pr}_{NE}(\text{error})$ is almost 1 regardless of
$p_B$. However, the worst case of $\mathrm{Pr}_{MME}(\text{error})$ is slightly above
0.6 when $p_B$ is small. Moreover, we show the probabilities
of error detection as a function of $\tau$ with a given pair of
$p_A$ and $p_B$ in Fig. 6. The performance of both estimators
improves as $\tau$ increases. Furthermore, the sum of the
integral $\int_0^{+\infty} \mathrm{Pr}_{NE}(\text{error})\, d\tau$ over the two figures is 41.43, while
the sum of the integral $\int_0^{+\infty} \mathrm{Pr}_{MME}(\text{error})\, d\tau$ in these two
cases is only 34.76. This shows that the improvement gain
of our estimators over the naive estimator when $p_A < p_B$
outweighs the degradation suffered when $p_A > p_B$, indicating
the benefits of applying our estimators.

Fig. 6. Analytical results of Pr(error) when changing $\tau$. (a) Pr(error) ($p_A = 0.02$ and $p_B = 0.05$). (b) Pr(error) ($p_A = 0.05$ and $p_B = 0.02$).

Note that $p_A$, $p_B$, and $\tau$ can be random variables. To
evaluate the overall performance of each estimator, we
consider the average probability of error detection over $p_A$,
$p_B$, and $\tau$, i.e.,

$$E[\Pr(\text{error})] = \int_{\tau} \int_{p_A} \int_{p_B} \Pr(\text{error})\, f(p_A, p_B, \tau)\, dp_B\, dp_A\, d\tau. \tag{40}$$

Since $p_A$, $p_B$, and $\tau$ are independent,

$$f(p_A, p_B, \tau) = f(p_A) \cdot f(p_B) \cdot f(\tau). \tag{41}$$

We then consider some cases in which we are interested
and apply the numerical integration toolbox in Matlab [32]
to calculate the triple integration. For example, we assume
that $s_A$ and $s_B$ follow a normal distribution and that $\tau$
is uniformly distributed over a bounded interval. We find that
when the parameters of these distributions are
set to realistic values, we always have

$$E[\mathrm{Pr}_{NE}(\text{error})] > E[\mathrm{Pr}_{MME}(\text{error})]. \tag{42}$$

That is, our proposed estimators perform better than NE on
average, which will further be verified in Section 4 through
simulations.

Moreover, in Fig. 5(a), it can be seen that the majority
of detection errors for the naive estimator come from the
case that $p_A < p_B$. Specifically, the following theorem
follows directly from Equations (34) and (37).

Theorem 2: When $p_A < p_B$,

$$\mathrm{Pr}_{MME}(\text{error}) = \mathrm{Pr}_{MLE}(\text{error}) = \mathrm{Pr}_{LRE}(\text{error}) = \mathrm{Pr}_{NE}(\text{error}) \cdot e^{-\left(1 - \frac{p_A}{p_B}\right)}. \tag{43}$$

That is, the error probability is decreased by a factor of
$e^{1 - p_A/p_B}$ by applying our estimators as compared with the
naive estimator.
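The closed forms can be evaluated numerically to check the relation stated in Theorem 2. The sketch below assumes the expressions obtained by integrating $f_Z(z)$ as in Equations (34), (37), and (38) under the exponential approximation; the function names are hypothetical, and the sample values of $p_A$, $p_B$, and $\tau$ are the ones used for Fig. 5 and Fig. 6.

```python
import math


def pr_error_naive(p_a, p_b, tau):
    """Pr(Z > tau) for the naive estimator (Equation (34))."""
    return p_b / (p_a + p_b) * math.exp(-p_a * tau)


def pr_error_proposed(p_a, p_b, tau):
    """Pr(Z > tau + (p_b - p_a)/(p_a p_b)) for MME/MLE/LRE (Equations (37)-(38))."""
    shift = tau + (p_b - p_a) / (p_a * p_b)
    if shift >= 0:
        return p_b / (p_a + p_b) * math.exp(-p_a * shift)
    return 1.0 - p_a / (p_a + p_b) * math.exp(p_b * shift)


tau = 50
# Case p_A < p_B: the proposed estimators reduce the error probability.
ne_slow_a = pr_error_naive(0.02, 0.05, tau)
mme_slow_a = pr_error_proposed(0.02, 0.05, tau)
ratio = mme_slow_a / ne_slow_a            # Theorem 2: exp(-(1 - p_A/p_B))

# Case p_A > p_B: the naive estimator has the lower error probability.
ne_fast_a = pr_error_naive(0.05, 0.02, tau)
mme_fast_a = pr_error_proposed(0.05, 0.02, tau)
print(ratio, ne_fast_a, mme_fast_a)
```

For $p_A = 0.02$ and $p_B = 0.05$ the ratio equals $e^{-0.6}$, independent of $\tau$, while swapping the two probabilities reverses the ordering, in line with Equation (39).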

4 Simulation Results 

In this section, we use simulations to verify our analytical 
results and then apply estimators to identify the patient 
zero or the hitlist. As far as we know, there is no publicly 
available data to show the real worm infection sequence. 
That is, there is no dataset available with the real infection 
sequence to serve as the ground truth and a comparison 
basis for performance evaluation. Therefore, we apply em- 
pirical simulations to provide the simulated worm infection 
time and infection sequence. 

4.1 Estimating the Host Infection Time 

We evaluate the performance of estimators in estimating
the host infection time. For the case of random-scanning
worms, we simulate the behavior of a host infected by
the Code Red v2 worm. The host is infected at time tick 0
and uses a constant scanning rate. The time unit is set
to 20 seconds. The Darknet records hit times during an
observation window. We consider the effects of the Darknet
size, the scanning rate, and the observation window size on
the performance of the estimators. The results are averaged
over 100 independent runs. Fig. 7 compares the performance
of NE, MME, and LRE with different Darknet sizes
from $2^{18}$ to $2^{25}$, a scanning rate of 358 scans/min, and an
observation window size of 800 mins. The three sub-figures
show the mean of estimators for $\mu$, the mean of estimators
for $t_0$, and the MSE of estimators for $t_0$. Fig. 8 compares
the three estimators with different scanning rates from 158
scans/min to 558 scans/min, a Darknet size of $2^{20}$, and an
observation window size of 800 mins. Similarly, Fig. 9 is
with different observation window sizes from 50 mins to
800 mins, a scanning rate of 358 scans/min, and a Darknet
size of $2^{20}$. It is observed that for all cases, our proposed
estimators have a better performance (i.e., unbiasedness and
smaller MSE) than the naive estimator in estimating the
host infection time. Specifically, the simulation results verify
Theorem 1, i.e., that the MSE of our estimators is almost
half of that of the naive estimator, when the observation
window size is sufficiently large (e.g., > 200 mins).

Fig. 7. Simulation results of changing the Darknet size for random scanning (all cases are for scanning rate: 358 scans/min,
observation window size: 800 mins).

Fig. 8. Simulation results of changing the scanning rate for random scanning (all cases are for Darknet size: $2^{20}$ IP
addresses, observation window size: 800 mins).

Fig. 9. Simulation results of changing the observation window size for random scanning (all cases are for scanning rate:
358 scans/min, Darknet size: $2^{20}$ IP addresses).

Next, we study a host infected by localized-scanning
worms and adopt the same simulation parameters and
settings as above. The main difference is that here
the host preferentially searches for vulnerable hosts in the
"local" address space with a probability $p_a$. In Fig. 10, $p_a$ is
set to 0.7, and we compare $\mathrm{MSE}(\hat{t}_0)$ for different estimators.
We find that the results are similar to those for the
random-scanning case shown in Figs. 7-9. The $\mathrm{MSE}(\hat{t}_0)$ in Fig. 10,
however, is larger for all cases since the localized-scanning
worm hits the Darknet less frequently than the
random-scanning worm. In Fig. 11, we compare the performance
of NE, MME, and LRE with different $p_a$ from 0 to 0.9,
a scanning rate of 358 scans/min, a Darknet size of $2^{20}$,
and an observation window size of 800 mins. Similarly,
the results show that our estimators are unbiased and the
MSE of our estimators is almost half of that of the naive
estimator.

Fig. 10. (a) Simulation results of changing the Darknet size for localized scanning ($p_a = 0.7$, scanning rate: 358 scans/min,
observation window size: 800 mins). (b) Simulation results of changing the scanning rate for localized scanning ($p_a = 0.7$,
Darknet size: $2^{20}$ IP addresses, observation window size: 800 mins). (c) Simulation results of changing the observation
window size for localized scanning ($p_a = 0.7$, scanning rate: 358 scans/min, Darknet size: $2^{20}$ IP addresses).

Fig. 11. Simulation results of changing $p_a$ for localized scanning (all cases are for scanning rate: 358 scans/min, Darknet
size: $2^{20}$ IP addresses, observation window size: 800 mins).

4.2 Estimating the Worm Infection Sequence 

We evaluate the performance of our algorithms in estimating
the worm infection sequence by simulating the
propagation of the Code Red v2 worm. The simulator
is extended from the code provided by [33], where the
parameter setting is based on the worm characteristics. The
Code Red worm has a vulnerable population of 360,000.
Different infected hosts may have different scanning rates.
Thus, we assign a scanning rate (scans/min) drawn from a normal
distribution $\mathcal{N}(358, \sigma^2)$ to a newly infected host. Moreover,
we start our simulation at time tick 0 from one infected
host. The time unit is set to 20 seconds. Detailed information
about how the parameters are chosen can be found in
Section VII of [22]. Each point in Fig. 12 is averaged over 20
independent runs. Table 4 gives the results of a sample run
with a Darknet size of $2^{20}$, an observation window size
of 1,600 mins, and $\sigma = 110$. In the table, $S_i$ is the actual
infection sequence (i.e., $S_i = i$), whereas $\hat{S}_i$ is the estimated
sequence. In this example, we find that MME and LRE can
pinpoint the patient zero successfully, while NE fails.
To compare the performance of estimators quantitatively, 



TABLE 4
A sample run of simulations for random scanning.

Columns: $S_i$, $\hat{S}_{i\mathrm{NE}}$, $\hat{S}_{i\mathrm{MME}}$, $\hat{S}_{i\mathrm{LRE}}$, $t_0$, $\hat{t}_{0\mathrm{NE}}$, $\hat{t}_{0\mathrm{MME}}$, $\hat{t}_{0\mathrm{LRE}}$
Surviving cell values, in extraction order: 114, 20, 20, 85, 98, 105, 165, 116, 116, 520, 498, 533, 534, 593, 622, 589, 589, 521, 433, 488, 477, 594, 611, 581, 580



we consider a simple $l_1$ sequence distance, i.e.,

$$D = \sum_{i=1}^{N} \left| S_i - \hat{S}_i \right|, \tag{44}$$



where $N$ is the length of the infection sequence considered.
Note that the smaller the sequence distance is, the better
the estimator performs. Fig. 12(a) shows the
sequence distances of NE, MME, and LRE with Darknet sizes
varying from $2^{19}$ to $2^{24}$, an observation window size
of 1,600 mins, $N = 1{,}000$, and $\sigma = 115$. It is observed that
when the Darknet size increases, the performance of all
estimators improves dramatically. Moreover, the performance
of MME and LRE is always better than that of NE. For
example, when the Darknet size equals $2^{19}$, MME and LRE
improve the inference accuracy by 24%, compared with
NE. Fig. 12(b) shows the sequence distances of these
three estimators when the standard deviation of the
scanning rate (i.e., $\sigma$) changes from 100 to 125. In the figure, the
Darknet size is $2^{20}$, the observation window size is 1,600
mins, and $N = 1{,}000$. It is noted that when $\sigma$ increases, the
performance of all estimators deteriorates. The performance
of MME and LRE, however, is always better than that of
NE. For example, when $\sigma = 120$, MME and LRE reduce the
sequence distance by 30%, compared with NE. In Fig. 12(c),
we increase the length of the infection sequence considered,
$N$, from 1,000 to 11,000. Here the Darknet size is $2^{20}$, the
observation window size is 1,600 mins, and $\sigma = 115$. It
is intuitive that the sequence distances of all estimators
become larger as $N$ increases. However, MME and LRE
are always better than NE.

Fig. 12. Simulation results of the sequence distance for random scanning. (a) Changing the Darknet size ($N = 1{,}000$,
observation window size: 1,600 mins, scanning rate: $\mathcal{N}(358, 115^2)$). (b) Changing the scanning rate standard deviation
($N = 1{,}000$, observation window size: 1,600 mins, Darknet size: $2^{20}$ IP addresses). (c) Changing the length of the infection
sequence considered (observation window size: 1,600 mins, Darknet size: $2^{20}$ IP addresses, scanning rate: $\mathcal{N}(358, 115^2)$).
Fig. 13. Simulation results of the sequence distance for localized scanning. (a) Changing the scanning rate standard
deviation ($p_a = 0.7$, $N = 1{,}000$, observation window size: 1,000 mins, Darknet size: $2^{24}$ IP addresses). (b) Changing the
length of the infection sequence considered ($p_a = 0.7$, observation window size: 1,000 mins, Darknet size: $2^{24}$ IP addresses,
scanning rate: $\mathcal{N}(358, 115^2)$). (c) Changing $p_a$ ($N = 1{,}000$, observation window size: 1,000 mins, Darknet size: $2^{24}$ IP
addresses, scanning rate: $\mathcal{N}(358, 115^2)$).
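As an illustration of the sequence distance in Equation (44), the following minimal Python sketch ranks hosts by their estimated infection times and sums the absolute rank differences. The function names, the 0-based ranking, and the toy infection times are assumptions made for this example only; they are not taken from the simulator used in the paper.

```python
from typing import List, Sequence


def estimated_sequence(est_infection_times: Sequence[float]) -> List[int]:
    """Rank hosts by estimated infection time (rank 0 = earliest).

    Host h is assumed to be the h-th host actually infected, so the
    actual sequence is simply S_h = h (0-based in this sketch).
    """
    order = sorted(range(len(est_infection_times)),
                   key=lambda host: est_infection_times[host])
    ranks = [0] * len(order)
    for position, host in enumerate(order):
        ranks[host] = position
    return ranks


def sequence_distance(actual: Sequence[int], estimated: Sequence[int]) -> int:
    """l1 distance between the actual and estimated infection sequences."""
    if len(actual) != len(estimated):
        raise ValueError("sequences must have the same length")
    return sum(abs(s - s_hat) for s, s_hat in zip(actual, estimated))


if __name__ == "__main__":
    # Hypothetical estimated infection times for four hosts.
    toy_times = [5.0, 30.0, 12.0, 40.0]
    s_hat = estimated_sequence(toy_times)
    print(s_hat, sequence_distance(range(len(s_hat)), s_hat))
```

A distance of zero means the estimated order matches the actual order exactly; swapping two adjacent hosts contributes 2 to the distance.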

Next, we extend our simulator to imitate the spread
of localized-scanning worms. Specifically, we consider /8
localized-scanning worms and a centralized /8 Darknet
with $2^{24}$ IP addresses. We still use the Code Red v2 worm
parameters and the same setting as for random scanning,
except that the observation window size is 1,000 mins
(this is because localized-scanning worms spread faster).
The distribution of vulnerable hosts is extracted from the
dataset provided by DShield [34]. DShield obtains the
information of vulnerable hosts by aggregating logs from
more than 1,600 intrusion detection systems distributed
throughout the Internet. Specifically, we use the dataset
for port 80 (HTTP), which is exploited by the Code Red v2
worm, to generate the vulnerable-host distribution. Each
point in Fig. 13 is averaged over 20 independent runs.
Fig. 13 compares the sequence distances of different estimators
for localized scanning. Specifically, the results in Fig. 13(a)
and (b) are similar to those in Fig. 12(b) and (c). In
Fig. 13(c), we compare the performance of the estimators by
increasing $p_a$ from 0 to 0.7. Here, $N = 1{,}000$, and $\sigma = 115$.
It is observed that the sequence distances of all estimators
increase as $p_a$ becomes larger. However, our estimators are
always better than NE. For example, when $p_a = 0.5$, MME
and LRE increase the inference accuracy by 27%, compared
with NE.

Therefore, our proposed estimators perform much better
than the naive estimator for both random-scanning and
localized-scanning worms in estimating the worm infection
sequence.

Fig. 14. Comparison of estimators when changing the hitlist size. (a) Random scanning (all cases are for Darknet size:
$2^{20}$ IP addresses, observation window size: 1,000 mins, hitlist hosts scanning rate: $\mathcal{N}(50, 20^2)$, other hosts scanning rate:
$\mathcal{N}(358, 110^2)$). (b) Localized scanning (all cases are for $p_a = 0.7$, Darknet size: $2^{24}$ IP addresses, observation window size:
1,000 mins, hitlist hosts scanning rate: $\mathcal{N}(50, 20^2)$, other hosts scanning rate: $\mathcal{N}(358, 110^2)$).

4.3 Identifying the Patient Zero or the Hitlist 

As discussed in Section 1, a smart worm can assign lower
scanning rates to the initially infected host(s) and higher
scanning rates to other infected hosts. In this way, the
Darknet might observe later infected hosts first, and
therefore the smart worm would weaken the performance of
the naive estimator. In Fig. 14, we compare the performance
of estimators in identifying the hitlist of such a
smart worm. Specifically, the worm assigns scanning rates
from $\mathcal{N}(50, 20^2)$ to the host(s) on the hitlist and scanning
rates from $\mathcal{N}(358, 110^2)$ to other infected hosts. Then, we
calculate the percentage of the host(s) on the hitlist that are
successfully identified by an estimator. For example, if the
size of the hitlist is 100 and 50 hosts that belong to the hitlist
are identified among the first 100 hosts of the estimated
infection sequence, the successful identification percentage
of the estimator is 50%. The results are averaged over 100
independent runs. Fig. 14(a) shows the case of random
scanning, where the Darknet size is $2^{20}$ and the observation
window size is 1,000 mins. It is seen that our estimators
have a higher successful identification percentage and a
smaller variance than the naive estimator. For instance,
when the size of the hitlist is 1 (i.e., the worm starts from the
patient zero), MME and LRE can pinpoint the patient zero
around 80% of the time, while NE can detect it only 70% of
the time. When the size of the hitlist is 10 or 100, compared
with NE, our proposed estimators increase the number of
successfully identified hosts from 5 to 7 or from 51 to 72, and
reduce the variance from 2.6 to 1.6 or from 23 to 13, respectively.
Fig. 14(b) shows the results of localized scanning, where the
Darknet size is $2^{24}$ and $p_a = 0.7$, and all other parameters
are the same as in the case of random scanning. The results
are similar to those in Fig. 14(a). Therefore, the simulation
results demonstrate that our proposed estimators are much
more effective in identifying the hitlist of the smart worm
than the naive estimator.
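The successful identification percentage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: hosts are represented by integer identifiers, `estimated_order` is a hypothetical list of hosts sorted by estimated infection time, and the function name is ours.

```python
from typing import Iterable, Sequence


def hitlist_identification_pct(hitlist: Iterable[int],
                               estimated_order: Sequence[int]) -> float:
    """Percentage of hitlist hosts found among the first k hosts of the
    estimated infection sequence, where k is the hitlist size."""
    hitlist_set = set(hitlist)
    if not hitlist_set:
        raise ValueError("hitlist must not be empty")
    # The first k entries of the estimated sequence are the candidates.
    top_k = set(estimated_order[:len(hitlist_set)])
    return 100.0 * len(hitlist_set & top_k) / len(hitlist_set)


if __name__ == "__main__":
    # Hypothetical example: 4 hitlist hosts, 2 of them ranked in the top 4.
    print(hitlist_identification_pct({0, 1, 2, 3}, [0, 7, 2, 9, 1, 3, 4]))
```

With a hitlist of size 1, the measure reduces to whether the estimator places the patient zero first.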

5 Discussions 

In this section, we first analyze the chance that Darknet 
misses an infected host and then discuss the limitations 
and the extensions of our proposed estimators. 

5.1 Host Missing Probability 

By applying Darknet observations, we have made an as- 
sumption: The infected host will hit the Darknet. Then, an 
intuitive question would be: What is the probability that the 
Darknet misses an infected host within a given observation 
window? 

We consider the case of localized scanning and regard
random scanning as a special case of localized scanning
with $p_a = 0$. The probability that a scan from an infected
host hits the Darknet is $(1 - p_a) \cdot \omega/\Omega$, and the
probability that the Darknet misses the host in a
time unit is $\left(1 - (1 - p_a) \cdot \omega/\Omega\right)^{s}$. Thus, the host missing
probability (i.e., the probability that the Darknet misses the
infected host in an observation window of $k$ time units) is

$$\Pr{}_{\mathrm{LS}}(\text{missing}) = \left(1 - (1 - p_a) \cdot \frac{\omega}{\Omega}\right)^{sk}. \tag{45}$$

In Fig. 15, we show the host missing probability as the
observation window size changes. In this example, we set
$\omega = 2^{24}$, time unit = 20 seconds, and $s$ = 358 scans/min.
We find that if $p_a = 0.7$, the infected host will almost surely
hit the Darknet when the observation window size
is larger than 20 mins. If $p_a = 0$, which is the case of
random scanning, a 5-min observation window is sufficient
to capture the infected host with high probability. Therefore,
in our previous analysis and simulation, the assumption
that the Darknet can observe scans from the infected host,
especially at the early stage, is reasonable. Moreover, our
estimator can still work even for self-stopping worms [35].

Fig. 15. Host missing probability ($p_a = 0.7$, Darknet size: $2^{24}$
IP addresses, scanning rate: 358 scans/min).
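The host missing probability can be evaluated numerically with a short Python sketch. It assumes the scanned address space is the full IPv4 space ($\Omega = 2^{32}$), which is our assumption for this example, and it takes the scanning rate in scans/min and the window in minutes, so that their product is the total number of scans (the exponent $sk$). The function name is ours.

```python
def host_missing_probability(p_a: float,
                             darknet_size: int,
                             scans_per_min: float,
                             window_min: float,
                             address_space: int = 2 ** 32) -> float:
    """Probability that none of the scans sent by an infected host within
    the observation window hits the Darknet.

    address_space defaults to 2**32 (IPv4); this default is an assumption
    of the sketch.
    """
    hit_per_scan = (1.0 - p_a) * darknet_size / address_space
    total_scans = scans_per_min * window_min
    return (1.0 - hit_per_scan) ** total_scans


if __name__ == "__main__":
    # Localized scanning (p_a = 0.7) with a 20-min window.
    print(host_missing_probability(0.7, 2 ** 24, 358, 20))
    # Random scanning (p_a = 0) with a 5-min window.
    print(host_missing_probability(0.0, 2 ** 24, 358, 5))
```

Under these assumptions both settings give a missing probability below $10^{-3}$, in line with the window sizes quoted in the text.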

5.2 Estimator Limitations and Extensions 

Our proposed estimators are built on several assumptions
stated earlier in the paper. Attackers that design future worms
may exploit these assumptions to weaken the accuracy of
our estimators. In the following, we discuss some limitations
of our estimators and potential extensions.

5.2.1 Darknet Avoidance

The majority of active worms to date do not attempt
to avoid detection by the Darknet. As a result, CAIDA's
network telescopes have observed many active Internet
worms such as Code Red, Slammer, Witty, and even
recently the Conficker worm (also known as the April
Fool's worm). Most worms apply random scanning and
localized scanning, and the Darknet can observe the traffic from
such worms.

Recent work, however, has shown that attackers can
potentially detect the locations of the Darknet or network
sensors [36]. Thus, a future worm can be specially designed
to avoid scanning the address space of the Darknet. The
countermeasure against such an intelligent worm is to
apply a distributed Darknet instead of a centralized
Darknet [23]. That is, unused IP addresses in many subnets
are used to observe worm traffic, which is then reported to
a collection center for further processing. A prototype of
distributed Darknet has been designed and evaluated in
[37].



5.2.2 Scanning Rate Variation 

Although there have been no observations of worms that
use scanning rate variation mechanisms (i.e., the scanning
rate of an individual infected host is time-variant) [29],
future worms may employ such schemes to invalidate
our basic assumption and thus weaken the performance
of our estimators. Changing the scanning rate, however,
introduces additional complexity to worm design and can
slow down worm spreading. Moreover, if the change of
scanning rates is relatively slow, our estimators can be
enhanced with change-point detection [38] to detect and
track when the scanning rate changes significantly and
then apply the early observations to derive the infection
time of an infected host.
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Fig. 16. Simulation results of the sequence distances of 
different estimators varying with the worm packet loss rate
($N = 1{,}000$, observation window size: 1,600 mins, scanning
rate: $\mathcal{N}(358, 115^2)$, Darknet size: $2^{20}$ IP addresses).



5.2.3 Measurement Errors 

Measurement errors can affect the performance of the
estimators. There are two types of measurement errors. A
false positive occurs when the Darknet incorrectly classifies the
traffic from a benign host as worm traffic, whereas a false
negative occurs when the Darknet incorrectly classifies worm traffic
as benign traffic or misses worm traffic due to congestion
or device malfunction.

For false positives, most of the time we can distinguish
worm traffic from other traffic. First, our estimation
techniques are used as a form of post-mortem analysis on
worm records logged by the Darknet. As a result, we can limit
our analysis to the records logged during the outbreak
of the worm when it is most rampant. More importantly,
worm packets always contain information about infection
vectors that distinguish worm traffic from other traffic. For
example, the Witty worm uses a source port of 4000 to
attack Internet Security Systems firewall products [16]. It is
very unlikely that a benign host uses a source port of 4000.
By filtering the records based on infection vectors specific
to the worm under investigation, we can eliminate most of
the effects of false positives on Darknet observations.

False negatives are much harder to eliminate. A packet
towards the Darknet may be lost due to congestion caused by
the worm (such as the Slammer worm [1]) or the malfunction
of Darknet monitoring devices. To study the effects
of false negatives, we modify our simulator to mimic
packet loss and evaluate the performance of our estimators
under false negatives. Here we assume that the loss rate
of the worm packets towards the Darknet (denoted as $r_{\mathrm{loss}}$) is
the same for each infected host. Fig. 16 shows how the
sequence distances of different estimators vary with the
worm packet loss rate. The results are averaged over 20
independent runs. It is intuitive that when the packet loss
rate becomes larger, the performance of all estimators worsens.
Our proposed estimators, however, always perform
much better than NE. For example, compared with NE,
our estimators (i.e., MME and LRE) improve the inference
accuracy by 28% when $r_{\mathrm{loss}} = 0.4$. A mechanism to recover
from worm-induced congestion has been proposed in [29],
which estimates the packet loss rates of infected hosts based
on Darknet observations and BGP atoms. This method
can be incorporated into our estimators to enhance their
robustness against worm-induced congestion.
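One simple way to mimic such packet loss in a simulator is to drop each observed hit independently with probability $r_{\mathrm{loss}}$. The Python sketch below illustrates this idea only; the function name and the independent-loss model are our assumptions, and the paper does not specify how its simulator implements the loss.

```python
import random
from typing import List, Optional, Sequence


def apply_packet_loss(hit_times: Sequence[float],
                      loss_rate: float,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return the hit times that survive independent packet loss.

    Each Darknet hit is dropped with probability loss_rate (assumed to be
    the same for every infected host, as in the text).
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so a hit is kept with probability
    # 1 - loss_rate.
    return [t for t in hit_times if rng.random() >= loss_rate]


if __name__ == "__main__":
    hits = [3, 8, 15, 21, 30, 42]  # hypothetical hit times of one host
    print(apply_packet_loss(hits, 0.4, random.Random(1)))
```

The estimators would then be applied to the thinned hit times instead of the full record, which delays the first observed hit and lengthens the observed inter-arrival times.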

6 Related Work 

Under the framework of Internet worm tomography, sev- 
eral works have applied Darknet observations to infer the 
characteristics of worms. For example, Chen et al. studied 
how the Darknet can be used to monitor, detect, and defend 
against Internet worms fPTI . Moore et al. applied network 
telescope observations and least squares fitting methods to 
infer the number of infected hosts and scanning rates of 
infected hosts [23]. Some works have studied how
to use Darknet observations to detect the appearance of
worms [18], [22], [20], [21]. For instance, Zou et al. used a
Kalman filter to infer the infection rate of a worm and then
detect the worm [22]. Moreover, the Darknet observations
have been used to study the feature of a specific worm, 
such as Code Red fTBI , Slammer |1|, and Witty (161 . 

Internet worm tomography has been applied to infer
worm temporal behaviors. For example, Kumar et al. used
network telescope data and analyzed the pseudo-random
number generator to reconstruct the "who infected whom"
infection tree of the Witty worm [24]. Hamadeh et al. further
described a general framework to recover the infection
sequence for both TCP and UDP scanning worms from
network telescope data [39]. Rajab et al. applied the same
data and studied the "infection and detection times" to
infer the worm infection sequence [25]. Different from the
above works, in this work we apply advanced statistical
estimation techniques to Internet worm tomography.

7 Conclusions 

In this paper, we have attempted to understand the tempo- 
ral characteristics of Internet worms through both analysis 
and simulation under the framework of Internet worm 
tomography. Specifically, we have proposed method of 
moments, maximum likelihood, and linear regression es- 
timators to infer the host infection time and reconstruct the 
worm infection sequence. We have shown analytically and 
empirically that the mean squared error of our proposed 
estimators can be almost half of that of the naive estimator 
in estimating the host infection time. Moreover, we have 
formulated the problem of estimating the worm infection 
sequence as a detection problem and have calculated the 
probability of error detection for different estimators. We
have demonstrated empirically that our estimation techniques
perform much better than the algorithm used in [25]
in estimating the worm infection sequence and in identifying
the hitlist for both random-scanning and localized-scanning
worms.

Appendix A 

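Table 2 Estimator Properties ($\hat{\mu}$)

We calculate the bias, the variance, and the MSE of different
estimators for estimating $\mu$.

A.1 Naive Estimator

Since $\hat{\mu}_{\mathrm{NE}} = 1$, the bias of NE is

$$\mathrm{Bias}(\hat{\mu}_{\mathrm{NE}}) = E(\hat{\mu}_{\mathrm{NE}}) - \mu = 1 - \frac{1}{p}. \tag{A.46}$$

Note that $\hat{\mu}_{\mathrm{NE}}$ is constant. Thus, the variance of NE is

$$\mathrm{Var}(\hat{\mu}_{\mathrm{NE}}) = E\left[\left(\hat{\mu}_{\mathrm{NE}} - E(\hat{\mu}_{\mathrm{NE}})\right)^2\right] = 0. \tag{A.47}$$

Therefore,

$$\mathrm{MSE}(\hat{\mu}_{\mathrm{NE}}) = \mathrm{Bias}^2(\hat{\mu}_{\mathrm{NE}}) + \mathrm{Var}(\hat{\mu}_{\mathrm{NE}}) = \frac{(1-p)^2}{p^2}. \tag{A.48}$$

A.2 Method of Moments Estimator / Maximum Likelihood Estimator

Since $E(\delta_i) = \mu$ for $i = 1, 2, \cdots, n-1$ and the corresponding
equations in the main text hold, the bias of $\hat{\mu}_{\mathrm{MME}}$ (or $\hat{\mu}_{\mathrm{MLE}}$) is calculated as

$$\mathrm{Bias}(\hat{\mu}_{\mathrm{MME}}) = E\left(\frac{1}{n-1}\sum_{i=1}^{n-1}\delta_i\right) - \mu = 0, \tag{A.49}$$

which is unbiased. Note that $\mathrm{Var}(\delta_i) = \frac{1-p}{p^2}$ for $i = 1, 2, \cdots, n-1$
and the $\delta_i$'s are independent. Thus, we have

$$\mathrm{Var}(\hat{\mu}_{\mathrm{MME}}) = \mathrm{Var}\left(\frac{1}{n-1}\sum_{i=1}^{n-1}\delta_i\right) = \frac{1-p}{p^2(n-1)}. \tag{A.50}$$

Therefore, the MSE of $\hat{\mu}_{\mathrm{MME}}$ (or $\hat{\mu}_{\mathrm{MLE}}$) is

$$\mathrm{MSE}(\hat{\mu}_{\mathrm{MME}}) = \mathrm{Bias}^2(\hat{\mu}_{\mathrm{MME}}) + \mathrm{Var}(\hat{\mu}_{\mathrm{MME}}) = \frac{1-p}{p^2(n-1)}. \tag{A.51}$$

It is noted that for an unbiased estimator, the MSE is
identical to its variance.

A.3 Linear Regression Estimator

Note that $\hat{\mu}_{\mathrm{LRE}} = \dfrac{\overline{i \cdot t} - \bar{i} \cdot \bar{t}}{\overline{i^2} - (\bar{i})^2}$.
From the corresponding equation in the main text and
$t_i = t_0 + \sum_{j=0}^{i-1}\delta_j$, $i = 1, 2, \cdots, n$, we have

$$\overline{i \cdot t} = \frac{1}{n}\sum_{i=1}^{n} i\, t_i = \frac{n+1}{2}\, t_0 + \sum_{i=0}^{n-1}\frac{(n-i)(n+i+1)}{2n}\,\delta_i \tag{A.52}$$

and

$$\bar{i} \cdot \bar{t} = \bar{i} \cdot \frac{1}{n}\sum_{j=1}^{n} t_j = \bar{i} \cdot t_0 + \bar{i} \cdot \sum_{i=0}^{n-1}\frac{n-i}{n}\,\delta_i. \tag{A.53}$$

Since $\bar{i} = \frac{n+1}{2}$ and $\overline{i^2} = \frac{(n+1)(2n+1)}{6}$,

$$\overline{i \cdot t} - \bar{i} \cdot \bar{t} = \sum_{i=1}^{n-1}\frac{i(n-i)}{2n}\,\delta_i \tag{A.54}$$

and

$$\overline{i^2} - (\bar{i})^2 = \frac{n^2-1}{12}. \tag{A.55}$$

Note that $E(\delta_i) = \mu$ and $\mathrm{Var}(\delta_i) = \frac{1-p}{p^2}$, $i = 0, 1, \cdots, n-1$,
and the $\delta_i$'s are independent. Moreover, $\sum_{i=1}^{n} i^3 = \left(\frac{n(n+1)}{2}\right)^2$
and $\sum_{i=1}^{n} i^4 = \frac{1}{30}\left(6n^5 + 15n^4 + 10n^3 - n\right)$. Then, we have

$$E\left(\overline{i \cdot t} - \bar{i} \cdot \bar{t}\right) = \sum_{i=1}^{n-1}\frac{i(n-i)}{2n}\,\mu = \frac{n^2-1}{12}\,\mu \tag{A.56}$$

and

$$\mathrm{Var}\left(\overline{i \cdot t} - \bar{i} \cdot \bar{t}\right) = \sum_{i=1}^{n-1}\left(\frac{i(n-i)}{2n}\right)^2 \cdot \frac{1-p}{p^2} = \frac{1-p}{4n^2p^2}\left(n^2\sum_{i=1}^{n-1} i^2 - 2n\sum_{i=1}^{n-1} i^3 + \sum_{i=1}^{n-1} i^4\right) = \frac{1-p}{p^2} \cdot \frac{n^4-1}{120n}. \tag{A.57}$$

Therefore, the bias of $\hat{\mu}_{\mathrm{LRE}}$ can be calculated as

$$\mathrm{Bias}(\hat{\mu}_{\mathrm{LRE}}) = E\left[\frac{\overline{i \cdot t} - \bar{i} \cdot \bar{t}}{\overline{i^2} - (\bar{i})^2}\right] - \mu = 0, \tag{A.58}$$

which is unbiased. Moreover, the variance and the MSE of
$\hat{\mu}_{\mathrm{LRE}}$ are

$$\mathrm{MSE}(\hat{\mu}_{\mathrm{LRE}}) = \mathrm{Var}(\hat{\mu}_{\mathrm{LRE}}) = \mathrm{Var}\left[\frac{\overline{i \cdot t} - \bar{i} \cdot \bar{t}}{\overline{i^2} - (\bar{i})^2}\right] = \frac{6(n^2+1)(1-p)}{5n(n^2-1)p^2}. \tag{A.59}$$

Appendix B

Table 3 Estimator Properties ($\hat{t}_0$)

We calculate the bias, the variance, and the MSE of different
estimators for estimating $t_0$.

B.1 Naive Estimator

Since $\hat{t}_{0\mathrm{NE}} = t_1 - \hat{\mu}_{\mathrm{NE}} = t_0 + \delta_0 - 1$, $E(\delta_0) = \frac{1}{p}$, and
$\mathrm{Var}(\delta_0) = \frac{1-p}{p^2}$,

$$\mathrm{Bias}(\hat{t}_{0\mathrm{NE}}) = t_0 + E(\delta_0) - 1 - t_0 = \frac{1-p}{p}, \tag{B.60}$$

$$\mathrm{Var}(\hat{t}_{0\mathrm{NE}}) = \mathrm{Var}(t_0 + \delta_0 - 1) = \frac{1-p}{p^2}, \tag{B.61}$$

$$\mathrm{MSE}(\hat{t}_{0\mathrm{NE}}) = \mathrm{Bias}^2(\hat{t}_{0\mathrm{NE}}) + \mathrm{Var}(\hat{t}_{0\mathrm{NE}}) = \frac{(1-p)(2-p)}{p^2}. \tag{B.62}$$

Note that when $p \ll 1$, $\mathrm{MSE}(\hat{t}_{0\mathrm{NE}}) \approx \frac{2(1-p)}{p^2}$.

B.2 Method of Moments Estimator / Maximum Likelihood Estimator

Note that $\hat{t}_{0\mathrm{MME}} = \hat{t}_{0\mathrm{MLE}} = t_0 + \delta_0 - \hat{\mu}_{\mathrm{MME}}$ and
$E(\delta_0) = E(\hat{\mu}_{\mathrm{MME}}) = \mu$. Thus,

$$\mathrm{Bias}(\hat{t}_{0\mathrm{MME}}) = t_0 + E(\delta_0) - E(\hat{\mu}_{\mathrm{MME}}) - t_0 = 0. \tag{B.63}$$

Since $\hat{\mu}_{\mathrm{MME}} = \frac{1}{n-1}\sum_{i=1}^{n-1}\delta_i$, which is independent of $\delta_0$,

$$\mathrm{MSE}(\hat{t}_{0\mathrm{MME}}) = \mathrm{Var}(\hat{t}_{0\mathrm{MME}}) = \mathrm{Var}(\delta_0) + \mathrm{Var}(\hat{\mu}_{\mathrm{MME}}) = \frac{1-p}{p^2} \cdot \frac{n}{n-1}, \tag{B.65}$$

based on Equation (A.50) and $\mathrm{Var}(\delta_0) = \frac{1-p}{p^2}$. Note that
when $n \gg 1$, $\mathrm{MSE}(\hat{t}_{0\mathrm{MME}}) \approx \frac{1-p}{p^2}$.

B.3 Linear Regression Estimator

Since $\hat{t}_{0\mathrm{LRE}} = t_0 + \delta_0 - \hat{\mu}_{\mathrm{LRE}}$ and $E(\delta_0) = E(\hat{\mu}_{\mathrm{LRE}}) = \mu$,

$$\mathrm{Bias}(\hat{t}_{0\mathrm{LRE}}) = t_0 + E(\delta_0) - E(\hat{\mu}_{\mathrm{LRE}}) - t_0 = 0, \tag{B.66}$$

$$\mathrm{MSE}(\hat{t}_{0\mathrm{LRE}}) = \mathrm{Var}(\hat{t}_{0\mathrm{LRE}}) = \mathrm{Var}(\delta_0 - \hat{\mu}_{\mathrm{LRE}}). \tag{B.67}$$

Note that from Equations (A.54) and (A.55),
$\hat{\mu}_{\mathrm{LRE}} = \frac{12}{n^2-1}\sum_{i=1}^{n-1}\frac{i(n-i)}{2n}\,\delta_i$, which is independent of $\delta_0$. Hence,

$$\mathrm{MSE}(\hat{t}_{0\mathrm{LRE}}) = \mathrm{Var}(\hat{t}_{0\mathrm{LRE}}) = \mathrm{Var}(\delta_0) + \mathrm{Var}(\hat{\mu}_{\mathrm{LRE}}) = \frac{1-p}{p^2} \cdot \frac{5n^3 + 6n^2 - 5n + 6}{5n(n^2-1)}, \tag{B.68}$$

based on Equation (A.59) and $\mathrm{Var}(\delta_0) = \frac{1-p}{p^2}$. Note that
when $n \gg 1$, $\mathrm{MSE}(\hat{t}_{0\mathrm{LRE}}) \approx \frac{1-p}{p^2}$.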
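The closed-form MSEs of NE and MME for the host infection time can be checked with a small Monte Carlo experiment. The Python sketch below assumes the model implied by the appendix: inter-arrival times $\delta_i$ of Darknet hits are independent geometric random variables on $\{1, 2, \ldots\}$ with parameter $p$ (mean $1/p$), and $t_1 = t_0 + \delta_0$. The function names, the inverse-transform sampler, and the chosen parameters are ours and serve only as an illustration.

```python
import math
import random
from typing import Tuple


def geometric(rng: random.Random, p: float) -> int:
    """Draw from a geometric distribution on {1, 2, ...}, 0 < p < 1."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1


def simulate_mse(p: float, n: int, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Empirical MSE of the NE and MME estimates of t_0 from n hits."""
    rng = random.Random(seed)
    t0 = 0
    se_ne = 0.0
    se_mme = 0.0
    for _ in range(trials):
        deltas = [geometric(rng, p) for _ in range(n)]  # delta_0 .. delta_{n-1}
        t1 = t0 + deltas[0]                             # first Darknet hit
        mu_hat = sum(deltas[1:]) / (n - 1)              # MME of mu
        se_ne += (t1 - 1 - t0) ** 2                     # NE: t_1 - 1
        se_mme += (t1 - mu_hat - t0) ** 2               # MME: t_1 - mu_hat
    return se_ne / trials, se_mme / trials


def theoretical_mse(p: float, n: int) -> Tuple[float, float]:
    """Closed forms (B.62) and (B.65)."""
    base = (1.0 - p) / p ** 2
    return base * (2.0 - p), base * n / (n - 1)


if __name__ == "__main__":
    print(simulate_mse(0.05, 20, 20000, seed=1))
    print(theoretical_mse(0.05, 20))
```

For small $p$ and moderately large $n$, the empirical MSE of MME should come out close to half of that of NE, matching the ratio of (B.65) to (B.62).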
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